Motivation
High-Level Overview:
High-Level Overview (Part 2)
Low-Level Overview
Conclusion




Motivation 
Motivation for Red Cup Replacement

Young adults today face a unique problem. Like generations before, they 
experiment with alcohol and other less respected substances. Like 
generations before, they often act very irresponsibly while under the 
influence. However, unlike generations before, today’s youth have cheap 
and plentiful digital cameras with which they can immortalize their rash 
actions. Additionally, Facebook and other online services have made it
easier than ever before to publicly share these visual memories within
seconds.


Unfortunately, once posted, it is sometimes hard to control who can see 
these pictures. Many parents look through their children’s Facebook photos.
Employers search for prospective hires online. Even if these unwanted 
viewers do not closely examine every picture, they can easily scroll through 
a series of thumbnails to scan for incriminating photos. 


Additionally, these irresponsible pictures often stand out to viewers.
Several distinctive objects appear in such pictures again and again. In
particular, the red solo cup, beverage container of choice at many parties,
has a memorable color and shape. A picture of a party can often be
identified at a glance, simply because of the presence of red solo cups.


This leaves the modern young adult in an unenviable position. They desire 
to keep and share memories with their friends, whilst not incriminating 
themselves to the unintended viewer scrolling through their photos. This is 
a large problem, but there is a solution. If the photos could be easily edited 
to appear “clean,” the young adult could upload them without fear. That is 
the intent of our project: Red Cup Replacement. 


High-Level Overview:


The goal of our project was to write a function that allowed students to "fix" 
pictures before uploading them to Facebook. Due to the time constraints of 
the project, we decided to define "fix" as altering a picture so that if one 
casually glanced at it, they would not be able to tell that it had been 
changed. Below are two figures that demonstrate this casual glance test. 


Figure 1: 


Figure 2: 


It is visually obvious that Figure 1 has been "fixed" to produce Figure 2. 
Looking closely at Figure 2 alone, it is also apparent that many of the "fixed"
objects look slightly strange and out of place. This imperfection is intended. 
Our goal was not to make the "fixed" picture look perfect, but to make it 
look passable if viewed briefly when scrolling through countless Facebook 
pictures. This way, if a student’s relative or future employer finds their 
Facebook page, they will not find any suspect pictures while scanning over 
the thousands of tagged photographs. In addition to this casual glance
limitation, we also decided to remove only red cups from pictures. College
students can be holding too many kinds of suspect items for us to identify
them all within the scope of this project. Thus we decided to identify only
red solo cups as a “proof of concept.” We chose red cups because they are
fairly easy to identify by their red base and white top.


The function we used to “fix” pictures can be broken down into two key 
processes: first we identify the red cups in the picture, and then we replace 
them with more appropriate items. 


Identification: 


We start the identification process with a technique called template
matching. First, we create a bank of template images that we wish to use to
identify red solo cups. This image bank contains a variety of red solo cups
with different backgrounds, angles, hands holding them, etc. The reason for
such a broad range of templates will be explained later in this module.
Then, one by one, we use each template image to identify red solo cups in
the target image. This is done by shifting the template across the target
image and computing the sum of products at each position. This sum of
products, otherwise known as the cross-correlation between the template and
the selected portion of the target image, represents how closely the
template and the selected portion match. This process is shown in Figure 3.
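The shifting and sum-of-products step can be sketched in a few lines. This is a minimal plain-Python illustration, not the project's MATLAB code; the function name cross_correlate and the tiny grayscale arrays are made up for the example.

```python
def cross_correlate(image, template):
    """Slide the template over the image and return the sum of products
    at every offset where the template fits entirely inside the image."""
    img_h, img_w = len(image), len(image[0])
    tpl_h, tpl_w = len(template), len(template[0])
    scores = []
    for top in range(img_h - tpl_h + 1):
        row = []
        for left in range(img_w - tpl_w + 1):
            total = 0
            for dy in range(tpl_h):
                for dx in range(tpl_w):
                    total += image[top + dy][left + dx] * template[dy][dx]
            row.append(total)
        scores.append(row)
    return scores


# Hypothetical 3x4 grayscale image containing the 2x2 template at row 1, column 1.
image = [
    [0, 0, 0, 0],
    [0, 1, 2, 0],
    [0, 3, 4, 0],
]
template = [[1, 2], [3, 4]]
scores = cross_correlate(image, template)
best = max((value, (r, c)) for r, row in enumerate(scores) for c, value in enumerate(row))
```

In this toy case the largest score sits at the offset where the template lines up with its copy in the image. A raw sum of products also grows with overall brightness, which is one reason a normalized correlation is generally preferred in practice.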


Figure 3: 


If the calculated correlation is above a certain threshold, we save the
location and size of the template and identify this position as a red solo cup.
This process is shown in Figure 4. Once a template image has been shifted
across all possible positions in the target image, the size of the target image
is reduced and the shifting starts over. This rescaling is repeated until
the target image has shrunk to the size of the template. The reason for
resizing will be discussed later in this module.


Figure 4: 


High-Level Overview (Part 2) 


Replacement 

Once all of the cups have been identified, the replacement process begins by
breaking up the area surrounding each cup into small boxes and examining
each one individually. We then cycle through a bank of re-sized possible
replacement images, breaking the corresponding area into boxes of the same
size. We calculate the cross-correlation between each pair of boxes, giving
us an idea of how well the replacement image matches the area surrounding
the cup in the actual picture. In order for the final product to look
cohesive, the replacement image needs to “fit” the area surrounding the hole
left by the cup. The correlation between the cup itself and the object in
the replacement image is inconsequential because the object will completely
replace the cup. Thus, only the correlation of the surrounding areas
determines the selection of our replacement image.


The correlation of each pair of boxes is calculated in the same way as in the
identification process, by taking the sum of products of each pixel. Once
every pair of boxes has had its correlation calculated, the best match in the
replacement bank is determined and placed in the position the detected
red cup occupies. The best match does not necessarily have the most boxes
with the highest correlation, or have the highest total correlation. Instead,
some boxes are given a greater weight based on results from running an
edge detector on each box. If the box contains more edges, it is given a
greater weight. This is because it is most important to match edges when
replacing the red solo cup, particularly the edges of hands. It is more
obvious when edges do not match, and therefore edges are given a higher
priority when selecting a replacement image. The replacement bank contains
pictures of objects being held in hands in different orientations, so this
method of weighting the boxes differently should lead to the most visually
pleasing replacement possible.


The surrounding areas whose correlation was calculated are blurred and
combined with the area surrounding the hole using a simple linear intensity
blend, which helps the replacement image merge smoothly into the image
being filtered.


Low-Level Overview 


Algorithmically, our red cup replacement program breaks down into three
main sections: cup identification, finding a suitable replacement image, and
the merger of the found image into the original. Each part presents its own
technical challenges and solutions.


Identification 


Our test identification algorithm is based on simple template matching.
Basically, the template image of a desired object is slid across the
original image and the correlation between the two is found at every point.
The correlation is then normalized with respect to the intensity of the
original image, giving a correlation value in the range between -1 and 1.
This process is encapsulated in the MATLAB function normxcorr2, which
takes two grayscale image matrices and returns one correlation matrix
whose width and height are each one less than the sum of the corresponding
dimensions of the original matrices.
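To make the normalization concrete, here is a minimal plain-Python sketch of a normalized correlation coefficient for a single template position. It is not a reimplementation of normxcorr2 (which evaluates every offset, including partial overlaps); the function name normalized_correlation and the sample patches are assumptions made for illustration.

```python
import math


def normalized_correlation(window, template):
    """Correlation coefficient between two equally sized grayscale patches.
    Subtracting each patch's mean and dividing by the patch energies keeps
    the result between -1 and 1 regardless of overall brightness."""
    w = [pixel for row in window for pixel in row]
    t = [pixel for row in template for pixel in row]
    w_mean = sum(w) / len(w)
    t_mean = sum(t) / len(t)
    numerator = sum((a - w_mean) * (b - t_mean) for a, b in zip(w, t))
    energy = sum((a - w_mean) ** 2 for a in w) * sum((b - t_mean) ** 2 for b in t)
    # A perfectly flat patch has no variation to correlate with; report 0.
    return numerator / math.sqrt(energy) if energy else 0.0


template = [[1, 2], [3, 4]]
brighter_copy = [[10, 20], [30, 40]]   # same pattern, ten times brighter
inverted_copy = [[4, 3], [2, 1]]       # pattern reversed
flat_patch = [[7, 7], [7, 7]]          # no pattern at all

same = normalized_correlation(brighter_copy, template)
opposite = normalized_correlation(inverted_copy, template)
nothing = normalized_correlation(flat_patch, template)
```

The brighter copy scores essentially 1 and the inverted copy essentially -1, which shows why a fixed threshold is meaningful on normalized values but not on raw sums of products.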


The program sets a threshold value (around 0.7, found by experimentation) to
determine whether our template has matched a cup in the original image. The
correlation is run on each color channel, and each result is compared with
the threshold separately. The program then combines the resulting
thresholded correlation matrices with a logical AND, so a match is only
found if it matches in terms of red, green, and blue. This prevents a red
region (100% red, 0% green, 0% blue) from matching with a white one
(100% red, 100% green, 100% blue). At this stage, all points that exceed
the threshold are considered matches. In order to find the actual location
of the cup, the algorithm finds the maximum correlation overall, records a
cup at that location, and then masks out the area of the found cup. This
neutralizes the other over-threshold points nearby that correspond to the
same cup, preventing overlapping cup hits. The algorithm then finds the
next greatest maximum value and repeats until all points over the threshold
have been accounted for.
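The thresholding, channel AND, and repeated peak picking described above might look roughly like the following plain-Python sketch. Two details are assumptions, since the text does not specify them: candidates are ranked by the mean of the three channel correlations, and the masked area is the cup's footprint around each hit. The name find_cups and the sample values are hypothetical.

```python
def find_cups(corr_r, corr_g, corr_b, cup_h, cup_w, threshold=0.7):
    """Return hit positions (row, col), strongest first, with no two hits
    closer together than one cup footprint."""
    candidates = {}
    for y in range(len(corr_r)):
        for x in range(len(corr_r[0])):
            values = (corr_r[y][x], corr_g[y][x], corr_b[y][x])
            # Logical AND: every color channel must clear the threshold.
            if all(v > threshold for v in values):
                candidates[(y, x)] = sum(values) / 3  # assumed ranking score
    hits = []
    while candidates:
        y, x = max(candidates, key=candidates.get)
        hits.append((y, x))
        # Mask out the found cup: drop every candidate inside its footprint.
        candidates = {
            pos: score
            for pos, score in candidates.items()
            if abs(pos[0] - y) >= cup_h or abs(pos[1] - x) >= cup_w
        }
    return hits


# Hypothetical one-row correlation maps for the three color channels.
corr_r = [[0.90, 0.80, 0.10, 0.95, 0.75, 0.20]]
corr_g = [[0.90, 0.85, 0.10, 0.20, 0.80, 0.20]]
corr_b = [[0.80, 0.90, 0.10, 0.90, 0.90, 0.20]]
hits = find_cups(corr_r, corr_g, corr_b, cup_h=1, cup_w=2)
```

Column 3 is rejected because its green correlation is low even though red and blue are high, and column 1 is suppressed because it falls inside the footprint of the stronger hit at column 0.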


Unfortunately, this approach only works for one size of cup in the source
image (the size of the template). To detect all cup sizes, the scale of the
template relative to the source image must change, and the correlation must
be run for each respective size. Our algorithm scales down the original
image using imresize and leaves the template small (to save on runtime by
reducing the correlation size instead of increasing it). After each small
change in size, the correlation function runs and saves matched regions to
an accumulation array. The function also keeps track of the masks of
previous match regions so smaller cups aren’t found erroneously inside of
larger cups. The match regions are recorded at the scale of the original
image, so the algorithm keeps track of the scale factor at each step and
sizes the recorded region accordingly.
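The scale bookkeeping can be illustrated with a short plain-Python sketch. The shrink step per iteration and the helper names scale_steps and to_original are assumptions for the example; the text only says that the image is reduced in small steps and that matches are recorded at the original scale.

```python
def scale_steps(image_size, template_size, step=0.9):
    """Yield scale factors, starting at 1.0, for as long as the shrunken
    image is still at least as large as the template in both dimensions."""
    scale = 1.0
    while (image_size[0] * scale >= template_size[0]
           and image_size[1] * scale >= template_size[1]):
        yield scale
        scale *= step


def to_original(match, template_size, scale):
    """Map a match found in the shrunken image back to original-image
    coordinates as (row, col, height, width)."""
    row, col = match
    tpl_h, tpl_w = template_size
    return (round(row / scale), round(col / scale),
            round(tpl_h / scale), round(tpl_w / scale))


# With a coarse step of 0.5, a 100x100 image and a 50x50 template give two scales.
scales = list(scale_steps((100, 100), (50, 50), step=0.5))

# A template-sized hit at (10, 20) in the half-size image covers a region
# twice as large, twice as far from the corner, in the original image.
region = to_original((10, 20), (30, 20), 0.5)
```

Shrinking the image rather than enlarging the template means each later pass correlates over fewer pixels, which matches the runtime motivation given above.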


Search 


The search algorithm builds on the idea of template matching and expands
it to a wider scope. Ideally, the program would exactly match the regions
around each cup and ignore the cup itself. Since our correlation function
cannot exclude the middle area, we had to use a different approach. The
replacement algorithm generates blocks around the found cup with a width
proportional to the size of the cup to be replaced. Each individual block is
then correlated against every image in the image bank, in the same way as
explained above. The main difference is that the search algorithm must
consider all blocks simultaneously: a match is only a match if it works all
the way around the suspect region. To achieve this, the correlation matrices
for the blocks are each shifted by the displacement of the block from the
origin of the replacement region and then merged. This generates a
correlation matrix that takes all blocks into account. The algorithm then
finds the region with the highest correlation from all the images and
passes that region to the merge algorithm.
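One way to read the shift-and-merge step is sketched below in plain Python. It assumes each block's correlation matrix is indexed by the block's own top-left position in the bank image, that block displacements are non-negative, and that merging means summing; the name merge_block_scores and the sample matrices are hypothetical.

```python
def merge_block_scores(block_scores, offsets):
    """Combine per-block correlation matrices into one matrix indexed by the
    position of the region origin.

    block_scores[b][y][x] is the score of block b placed at (y, x) in a bank
    image; offsets[b] = (dy, dx) is that block's displacement from the region
    origin. Placing the origin at (y, x) puts block b at (y + dy, x + dx)."""
    rows = min(len(s) - dy for s, (dy, dx) in zip(block_scores, offsets))
    cols = min(len(s[0]) - dx for s, (dy, dx) in zip(block_scores, offsets))
    combined = [[0.0] * cols for _ in range(rows)]
    for scores, (dy, dx) in zip(block_scores, offsets):
        for y in range(rows):
            for x in range(cols):
                combined[y][x] += scores[y + dy][x + dx]
    return combined


# Two blocks: one at the region origin, one displaced two columns to the right.
left_block = [[1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]]
right_block = [[0.0, 0.0, 1.0],
               [0.0, 0.0, 0.0]]
combined = merge_block_scores([left_block, right_block], [(0, 0), (0, 2)])
```

Both blocks peak when the region origin is at the top-left corner, so the merged matrix has its largest value there. A candidate that fits only one side of the cup would score lower than one that fits all the way around.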


We built our test image bank from a relatively small number of images and
just used MATLAB’s imread function to load each one serially. The program
runs the above block-based correlation on each image, keeping track of the
highest correlation value and its associated region.


Because of the block nature of the search algorithm, one simple
improvement we made was to give the blocks different weights based on
their importance to the continuity of the image. The human eye sees lines
and edges more than muted textures, so we gave more weight (by
multiplying their correlation matrices by a factor before the final
correlation sum) to blocks that contained more edges. This modification
helped ensure that arms stayed continuous and helped with the hand problem
(the frequent presence of hands over the cup).
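A small plain-Python sketch of edge-based weighting follows. The text does not say which edge detector or weighting formula was used, so the neighbor-difference detector, the edge threshold, and the base and gain parameters here are all stand-in assumptions.

```python
def edge_count(block, edge_threshold=50):
    """Count pixels whose intensity step to the right or downward neighbor
    exceeds edge_threshold (a crude stand-in for a real edge detector)."""
    count = 0
    for y in range(len(block) - 1):
        for x in range(len(block[0]) - 1):
            step_right = abs(block[y][x + 1] - block[y][x])
            step_down = abs(block[y + 1][x] - block[y][x])
            if max(step_right, step_down) > edge_threshold:
                count += 1
    return count


def block_weight(block, base=1.0, gain=0.5):
    """Weight applied to a block's correlation matrix before the final sum:
    every block counts, and blocks with more edge pixels count more."""
    return base + gain * edge_count(block)


# A uniform patch (for example, a plain wall) versus one crossed by a sharp
# vertical edge (for example, the outline of an arm).
flat = [[100, 100, 100], [100, 100, 100], [100, 100, 100]]
edged = [[0, 0, 255], [0, 0, 255], [0, 0, 255]]
flat_weight = block_weight(flat)
edged_weight = block_weight(edged)
```

With these made-up parameters the edged block counts twice as much as the flat one, so a candidate replacement that breaks the arm outline is penalized more than one that mismatches a plain background.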


Combination 


After finding a region to suitably replace the excised region from the 
original image, the new image is blended with the original. We used a 
conditional blend to completely replace the red cup, and then gradually 
blended the surrounding buffer regions together with the original image. 
Our blend algorithm used a linear intensity blend (scaled sum of the two 
images), but could be quickly improved with bicubic blur (taking blur 
information from above and below as well) and a more consistent merger 
(angled corners). 
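A linear intensity blend across the buffer region can be sketched as follows in plain Python. The ramp shape (evenly spaced weights from the outer edge of the buffer to the edge of the hole) and the name linear_blend_row are assumptions made for illustration.

```python
def linear_blend_row(original_row, replacement_row):
    """Blend one row of buffer pixels, ordered from the outer edge of the
    buffer toward the hole. The replacement's share grows linearly, so the
    result fades from the original image into the replacement image."""
    n = len(original_row)
    blended = []
    for i, (orig, repl) in enumerate(zip(original_row, replacement_row)):
        alpha = (i + 1) / (n + 1)  # replacement weight, strictly between 0 and 1
        blended.append((1 - alpha) * orig + alpha * repl)
    return blended


# Three buffer pixels between an original at intensity 100 and a replacement at 200.
blended = linear_blend_row([100, 100, 100], [200, 200, 200])
```

Inside the hole itself the replacement is used outright, which corresponds to the conditional part of the blend described above.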


Conclusion 


Red cup detection and replacement is an application of image recognition 
technology that could be viable today. Our demonstration shows that the 
technology has promise, but there are definitely still large technical hurdles 
to be overcome. Our template-matching-based cup recognition program
ignores important issues like partially obscured cups, rotated cups, or
deformed (crushed) cups. Template matching is ill-suited to deal with these
cases, but a “professional” version of the algorithm could use a combination
of computer vision techniques (edge detection, shape detection, color blob
detection, size transformation, multiple templates, etc.) to improve both
accuracy and runtime.


On the replacement end of things, the program could be vastly improved by 
a more structured and complete replacement image bank. Consistent sizing 
of the bank images (so the algorithm knows the whole size of the image is 
the whole size of the replacement region) would improve accuracy and 
speed up runtime by an entire polynomial order. While we avoided the 
question of runtime in our demonstration, any practical implementations 
would need to run much more efficiently. Red cup detection seems 
particularly suited to a smartphone application, allowing users to sanitize 
pictures as they are taken. Yet in its current form the algorithm would be 
prohibitively costly in terms of both time and battery life in a mobile 
environment. Offloading the image bank search to the cloud could greatly
reduce the load on the phone, since the cup detection and meshing parts of
the algorithm are relatively efficient. The phone could find suspect regions
in the image, offload the detected regions to the cloud where a server could
quickly search a large and organized image bank, and then locally
re-integrate the replaced regions to alleviate privacy concerns and keep the
whole image in the user’s possession.


Our process leaves lots of room to improve the final image mesh step too.
A better and more natural blend (bicubic instead of linear blending, plus
blended corners) would make the final results look much more realistic.
The blending algorithm could use edge detection to keep features
continuous, avoiding blur around sharp edges that would be obvious to the
viewer in the final image. Matching lighting and highlight intensity with
the original image would also greatly improve the quality of the final
product.


Additionally, we could implement different replacement algorithms 
depending on characteristics of the area surrounding the cups. A technique 
known as quilting works well for patterns, but not for unique objects such 
as hands. Quilting avoids many of the challenges posed by using a 
replacement image bank, because it uses other areas of the original image to 
“quilt” over the hole left by the cup. This means that quilting could be 
applied to a hole left by any image, eliminating the need for a huge bank of 
replacement images. Ideally, our program could choose the better 
replacement method based upon characteristics of the surrounding image. 


On an even grander scale, the algorithm could be generalized to allow users
to select and replace arbitrary offending items in an image (houses on a
scenic hillside, Aggie caps at a Longhorns game, etc.). This presents new
challenges in detection and in the construction of the image bank, but it
could conceivably work using the resources of large image databases like
Flickr or the Google Images cache.


Ultimately, our algorithm is only a proof of concept. Future researchers 
could focus on improving many things from the accuracy of detection to the 
seamlessness of reintegration to the accuracy and computational complexity 
of an image bank search. The distinctive shape and color of red cups lend
themselves well to computer detection and replacement, but the algorithm’s
principles could be generalized to other objects. There is a long way to go
to make a seamless professional implementation of the technology, but all
the pieces exist today.


